1. Did Spotify users in the Netherlands change their music listening behavior during the COVID-19 pandemic?

The sound of COVID-19: Spotify usage in the Netherlands during a pandemic

The COVID-19 pandemic has shaken society considerably. Many people are directly or indirectly affected by the health crisis or by the resulting governmental measures. These measures, such as social distancing and isolation, have changed how society communicates, works and organizes other aspects of daily life. This dashboard explores the following question:

Did Spotify users in the Netherlands change their music listening behavior during the COVID-19 pandemic?

A corpus has been created in order to perform various computational musicological analyses using the spotifyr and compmus packages.

The general listening behavior of Spotify users in the Netherlands before and during the pandemic will be explored, as measured by the Spotify API. In addition, specific events related to the pandemic (e.g. lockdown and curfew) will be considered to determine to what extent possible changes in listening behavior can be attributed to these events.

Corpus

To analyze general listening behavior, the most important variables for the portfolio are the Spotify audio features and the chart variables described below.


In order to keep track of the average listening behavior of Dutch Spotify users, the weekly ‘Top 50’ playlists from the Netherlands will be analyzed over time. The years 2019 (52 weeks) and 2020 (53 weeks) are measured in their entirety, and 2021 is measured until week 7.

  • 2019 contains 52 playlists consisting of 50 tracks per playlist
  • 2020 contains 53 playlists consisting of 50 tracks per playlist
  • 2021 contains 7 playlists consisting of 50 tracks per playlist

This totals 5600 observations/tracks. As a track can be in the charts for multiple weeks, duplicates occur. The number of unique tracks within the corpus is 826.
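The corpus size follows directly from the number of weekly charts. The sketch below reproduces that arithmetic and shows one way duplicates could be collapsed into unique tracks; the `track_id` field and the toy rows are hypothetical and only illustrate the idea, they are not the actual corpus.

```python
# Weekly charts per year as stated in the text; each chart holds 50 tracks.
WEEKS_PER_YEAR = {2019: 52, 2020: 53, 2021: 7}
TRACKS_PER_CHART = 50


def total_observations(weeks_per_year, tracks_per_chart):
    """Number of chart entries (observations) across all weekly charts."""
    return sum(weeks_per_year.values()) * tracks_per_chart


def count_unique_tracks(rows):
    """Count distinct tracks; 'track_id' is a hypothetical column name."""
    return len({row["track_id"] for row in rows})


# Toy rows: track "a" charts in two different weeks, so it is a duplicate.
toy_rows = [
    {"year": 2019, "week": 1, "track_id": "a"},
    {"year": 2019, "week": 2, "track_id": "a"},
    {"year": 2019, "week": 2, "track_id": "b"},
]

total = total_observations(WEEKS_PER_YEAR, TRACKS_PER_CHART)
unique_toy = count_unique_tracks(toy_rows)
print(total)       # 5600
print(unique_toy)  # 2
```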

Because Spotify automatically updates its playlists, the historical ‘Top 50’ lists are retrieved as CSV files from Spotify Charts.

Changes in listening behavior (or the lack thereof) will be measured by the different Spotify Audio Features:

• danceability

Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

• energy

Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

• key

The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.

• loudness

The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

• mode

Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
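The key and mode integers can be combined into a readable key name. The following is a minimal sketch of that mapping; the function name `key_name` is hypothetical, and the handling of -1 assumes the Spotify convention that -1 means no key was detected.

```python
# Pitch classes in standard Pitch Class notation, index 0 = C.
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def key_name(key, mode):
    """Translate Spotify's key (0-11) and mode (1 = major, 0 = minor)."""
    if key == -1:
        # Assumption: -1 signals that no key was detected.
        return "unknown"
    if not 0 <= key <= 11:
        raise ValueError("key must be between 0 and 11 (or -1)")
    if mode not in (0, 1):
        raise ValueError("mode must be 0 (minor) or 1 (major)")
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


print(key_name(7, 1))  # G major
print(key_name(0, 1))  # C major
print(key_name(7, 0))  # G minor
```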

• speechiness

Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
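The speechiness thresholds above can be read as three bands. A small sketch, with hypothetical band labels and a hypothetical function name, might look like this:

```python
def speechiness_band(value):
    """Map a speechiness value to the three bands described in the text."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("speechiness must be between 0.0 and 1.0")
    if value > 0.66:
        return "speech"            # probably made entirely of spoken words
    if value >= 0.33:
        return "music and speech"  # e.g. rap, or layered speech and music
    return "music"                 # most likely music or other non-speech


print(speechiness_band(0.04))  # music
print(speechiness_band(0.50))  # music and speech
print(speechiness_band(0.90))  # speech
```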

• acousticness

A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

• instrumentalness

Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

• liveness

Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

• tempo

The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

• valence

A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

• duration_ms

The duration of the track in milliseconds.

In addition, the following variables obtained through the Spotify API will be included:

  • Number of streams

  • Position

  • Track Name

  • Artist

The variable time will be used to identify the different weeks, as well as the periods before and during the pandemic, which may explain changes in music listening behavior in the top and viral playlists.

In addition, interesting annual periods will be isolated to see whether similar patterns recur during the pandemic. For example, the December holiday season before and during the pandemic will be analyzed to identify whether Spotify users altered their Christmas-related listening behavior.

  • Week
  • Year

The corpus runs from week 1, 2019 until week 7, 2021, and the data is split into two periods: Before the pandemic and During the pandemic. This makes it easier to attribute findings to these periods, rather than to individual years or weeks.
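The split into the two periods can be sketched as a comparison on (year, week) pairs. The text does not state in which week the split falls, so the cutoff below (week 9 of 2020, roughly when the first Dutch case was confirmed) is purely an assumption for illustration, and `period_label` is a hypothetical helper.

```python
# Assumed cutoff: first week counted as "During". Not taken from the dashboard.
ASSUMED_PANDEMIC_START = (2020, 9)


def period_label(year, week, start=ASSUMED_PANDEMIC_START):
    """Label a chart week as 'Before' or 'During' the pandemic."""
    # Tuples compare element by element, so (year, week) sorts chronologically.
    return "During" if (year, week) >= start else "Before"


print(period_label(2019, 52))  # Before
print(period_label(2020, 8))   # Before
print(period_label(2020, 9))   # During
print(period_label(2021, 7))   # During
```

Keeping the cutoff as a parameter makes it easy to test how sensitive later analyses are to the chosen starting week.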

Alongside the musical analyses, statistics concerning COVID-19 will be taken into account as well. The data is provided by the Dutch National Institute for Public Health and the Environment (RIVM) and has been pre-processed to include both weekly and cumulative figures. The following variables are included in this dashboard:

  • Number of Hospital Admissions

  • Number of Deaths

  • Reported cases of COVID-19

3. Trip down memory lane:
Comparing pre-pandemic to intra-pandemic listening behavior



In this frame you can make week-by-week comparisons of different variables for the periods before and during the pandemic.

The most interesting variables for comparing the two periods are valence and energy, as these reflect the valence/arousal model, which distinguishes the emotions Happy, Angry, Sad and Relaxed.
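The four quadrants of the valence/arousal model can be sketched as a simple threshold rule, with energy standing in for arousal. The 0.5 threshold and the function name `emotion_quadrant` are assumptions for illustration; the plots in this dashboard may draw the boundaries differently.

```python
def emotion_quadrant(valence, energy, threshold=0.5):
    """Assign a track to a valence/arousal quadrant (energy ~ arousal)."""
    high_valence = valence >= threshold
    high_energy = energy >= threshold
    if high_valence and high_energy:
        return "Happy"
    if high_energy:
        return "Angry"    # low valence, high energy
    if high_valence:
        return "Relaxed"  # high valence, low energy
    return "Sad"          # low valence, low energy


print(emotion_quadrant(0.8, 0.7))  # Happy
print(emotion_quadrant(0.2, 0.8))  # Angry
print(emotion_quadrant(0.2, 0.2))  # Sad
print(emotion_quadrant(0.8, 0.2))  # Relaxed
```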

The corpus is spread fairly evenly, and both before and during the pandemic most of the Top 50 tracks are in the Happy quadrant. This is not very surprising, as the Top 50 usually consists mostly of pop songs.

The weekly plots show that the average valence is much higher during the pandemic, and that the average weekly streams follow a similar trend, with some anomalies. These will be explored further on.

5. 17 Miljoen Mensen vs. 15 Miljoen Mensen - The prominent cover song during the pandemic shows little similarity with the original


DTW and Chromagrams

The track “17 Miljoen Mensen” (2020) is a cover of “15 Miljoen Mensen” (1996). An analysis of the chroma features of the two tracks aims to find similarities between them. Notice that the title of “17 Miljoen Mensen” was adjusted for the population increase of 2 million people, and that the track is short, with a duration of just 1 minute and 47 seconds. But what are the other differences or similarities?

The first plot shows the Dynamic Time Warping (DTW) plot of the two tracks, using the Euclidean norm and angular distance. A diagonal pattern would denote similarity between the two tracks. This is not observed, which implies significant differences between the two tracks. This is supported by the table, which shows that the pitch classes differ. According to the Spotify API, “17 Miljoen Mensen” is in the key of G major, whereas “15 Miljoen Mensen” is in the key of C major. The keys are not shown explicitly, but they are reflected in the respective chromagrams.
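To illustrate what the DTW plot summarizes, the sketch below computes a cumulative alignment cost between two short sequences of chroma-like vectors. It is a textbook DTW recursion in plain Python, not the compmus implementation; the angular distance is defined here as the angle between two vectors scaled to [0, 1] for non-negative vectors, which is an assumption, and the three-dimensional toy vectors stand in for real 12-dimensional chroma vectors.

```python
import math


def angular_distance(u, v):
    """Angle between two vectors, scaled so a right angle equals 1.0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat silent frames as maximally distant (assumption)
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine) * 2 / math.pi


def dtw_cost(seq_a, seq_b, dist):
    """Minimal cumulative cost of aligning seq_a with seq_b."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


original = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
stretched = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]  # same content, slower
different = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]             # different content

cost_similar = dtw_cost(original, stretched, angular_distance)
cost_different = dtw_cost(original, different, angular_distance)
print(cost_similar, cost_different)
```

A low cost along a diagonal path corresponds to the diagonal pattern mentioned above; the second pair has no cheap path, which is the situation observed for the two tracks.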

In addition, the ‘sound and feel’ of the tracks differs: “15 Miljoen Mensen” has a higher danceability, energy, and loudness, whereas “17 Miljoen Mensen” has a much higher acousticness and a somewhat higher liveness (due to the recording being a live performance).

A remarkable commonality probably explains the differences: both tracks were unintended single releases. “15 Miljoen Mensen” was initially written for a commercial, and “17 Miljoen Mensen” as a tribute for a music concert that was canceled due to COVID-19. The different motivations behind the tracks are reflected in the different ‘sound and feel’ shown by the Spotify API.

For a commercial you would want a catchier, more upbeat track than for a song related to a disaster or crisis. This explains the difference in loudness: “17 Miljoen Mensen” has a loudness of -10.041 dB, whereas “15 Miljoen Mensen” has a loudness of -7.063 dB.

Spotify Features Table

Feature            17 Miljoen Mensen (2020)   15 Miljoen Mensen (1996)
danceability       0.493                      0.547
energy             0.321                      0.631
key                7                          0
loudness           -10.041                    -7.063
mode               1                          1
speechiness        0.0402                     0.0266
acousticness       0.715                      0.0943
instrumentalness   0                          0
liveness           0.0863                     0.0548
valence            0.508                      0.481
tempo              86.77                      79.02
duration_sec       107.2                      236.107
time_signature     4                          4


6. All I Want before Christmas… is Christmas
Earlier Christmas in 2020 due to the lockdown.


Christmas songs started to dominate the charts in 2020 from around week 49 until week 53, whereas in 2019 this phenomenon occurred a bit later. In 2020 it is noticeable that the bottom right corner contains tracks with relatively high BPM, high valence, lower energy and lower danceability. During these weeks Christmas tracks dominated the charts. In 2019 this phenomenon is very noticeable in week 52, although the plot shows that Christmas slowly started in week 50. Also, in 2020 the charts remained similar during the holiday period from week 50 to 53, whereas in 2019 week 52 saw a spike in the Christmas-related audio features. This pattern implies that more Christmas tracks entered the Top 50.

Interestingly, Mariah Carey’s ‘All I Want for Christmas’ topped the charts for four consecutive weeks in 2020, as opposed to 1 week in 2019.

A possible explanation is that due to the imposed lockdown and other restrictions, people may have felt a need or desire for the “Christmas Spirit/Vibes” a week earlier than in 2019.

Another interesting discovery is that, as in 2019, the top streams in 2020 decreased during this period. A possible explanation is that people disregarded the lockdown regulations and spent the holiday season with friends and/or family, or were preoccupied with other activities to keep in touch with them.

7. Self-Similarity Matrix: “Dance Monkey” shows a repeating pattern and a noticeably distinct Millennial Whoop


Dance Monkey

“Dance Monkey” by Tones And I is one of the most popular tracks within the corpus. A structure analysis will show possible patterns of sequences within the track and their relation.

Cepstrogram

The first plot, the cepstrogram, shows the magnitude of each timbre feature per segment of the track. The feature c01 represents loudness, c02 low frequencies and c03 mid frequencies. c04 and higher are not defined as straightforwardly, but their meaning can be inferred by tracking how they change during specific segments of a track. The cepstrogram shows that the timbre of “Dance Monkey” is mostly defined by c01 to c05.

  • c01 Loudness: The segments reflect the loudness of the track, this is especially noted during the final chorus.
  • c02 Darkness: The segments faintly show a higher magnitude when the bass drum hits. Its omission is noted much more clearly during the breakdown starting at 150 seconds.
  • c03 Mid frequency: It’s shown at about 50 seconds and 165 seconds when higher notes are less and more distinct respectively.
  • c04 Attack: This is very prevalent during the intro (vocal stretch fade-in sfx).
  • c05 [Unknown]: It has the highest magnitude at around 150 seconds, where the loudness of the “Millennial Whoop” is noticeable.

Self Similarity

The second and third plots are Self-Similarity Matrices (SSMs), the first for pitch and the second for timbre. These plots show the structure of a track by denoting patterns of similarity that recur. Diagonal lines and a checkerboard pattern indicate similarity and repetition.

The timbre SSM is plotted using the Euclidean norm and Euclidean distance, summarized by the mean. The plot shows a faint checkerboard pattern, which implies some form of repetition in the track. At the 150 second mark there is a significant timbre difference. This is when the breakdown occurs with the earlier mentioned “Millennial Whoop”.

The pitch SSM is plotted using Euclidean norm, cosine distance and summarized by root mean square. This plot shows a slightly more noticeable checkerboard pattern. At the 150 second mark, the plot shows a significant change.
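A self-similarity matrix compares every segment of a track with every other segment. The sketch below builds one with Euclidean distance on toy feature vectors; it only illustrates the principle and is not the compmus implementation, and the two-dimensional frames are hypothetical stand-ins for pitch or timbre vectors.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def self_similarity(frames, dist=euclidean):
    """Pairwise distance matrix of a sequence of feature vectors with itself."""
    return [[dist(a, b) for b in frames] for a in frames]


# Toy structure A A B B A A: section A returns after a contrasting section B.
section_a = [1.0, 0.0]
section_b = [0.0, 1.0]
frames = [section_a, section_a, section_b, section_b, section_a, section_a]

ssm = self_similarity(frames)
for row in ssm:
    print(" ".join(f"{value:.2f}" for value in row))
```

In the printed matrix the zero-distance blocks form a checkerboard: the return of section A shows up as an off-diagonal block of zeros, while the contrasting section B stands out, much like the change at the 150 second mark.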



8. In the mood for which keys?
Chord and Key estimations for “Mood”.


The track “Mood” by 24kGoldn ft. iann dior is another popular track identified in the corpus. A keygram and a chordogram are plotted in order to show the tonal progression of the track by estimating the chords and key for each segment.

The keygram shows that E♭ major, G minor, F major, C major, G major, and C♯ minor are the prevalent keys during the track. The chordogram shows that C minor, E♭ 7, and E♭ major are the most prevalent chords of the track.

Spotify API

According to the Spotify API, this track is written in the 7th key, with mode 0: meaning G minor.

Chordify

The Chordify algorithm identified the chords within the following (4/4) loop:

E♭ - Gm - | B♭ - F - |

The identified key appears to correspond to the G natural minor scale:

G - A - B♭ - C - D - E♭ - F
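As a check on the scale above, the notes of a natural minor scale can be derived from its tonic using the step pattern whole, half, whole, whole, half, whole. The sketch below builds the scale on G and verifies that the four chords of the Chordify loop only use notes from that scale; the helper names are hypothetical.

```python
# Pitch-class names spelled with flats, index 0 = C.
FLAT_NAMES = ["C", "D♭", "D", "E♭", "E", "F", "G♭", "G", "A♭", "A", "B♭", "B"]
# Semitone steps between successive degrees of a natural minor scale.
NATURAL_MINOR_STEPS = [2, 1, 2, 2, 1, 2]


def natural_minor_scale(tonic):
    """Pitch classes of the natural minor scale starting on the given tonic."""
    pitch_classes = [tonic % 12]
    for step in NATURAL_MINOR_STEPS:
        pitch_classes.append((pitch_classes[-1] + step) % 12)
    return pitch_classes


def triad(root, quality):
    """Pitch classes of a major or minor triad."""
    third = 4 if quality == "major" else 3
    return [root % 12, (root + third) % 12, (root + 7) % 12]


g_minor = natural_minor_scale(7)  # 7 = G
scale_names = [FLAT_NAMES[pc] for pc in g_minor]
print(" - ".join(scale_names))  # G - A - B♭ - C - D - E♭ - F

# The Chordify loop: E♭, Gm, B♭, F.
loop = [triad(3, "major"), triad(7, "minor"), triad(10, "major"), triad(5, "major")]
loop_is_diatonic = all(pc in g_minor for chord in loop for pc in chord)
print(loop_is_diatonic)  # True
```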

The differences between the estimated chords are due to the different Audio Chord Estimation algorithms that the Spotify API and Chordify use. While Chordify is not considered perfect, it is considered ‘good enough’ to be useful. There are differences, but as seen above, there is some overlap between the two sources.



9. Histogram of Keys within the corpus shows C♯ as the most common key.


While the histogram doesn’t show a clear or unanimous preference, the keys C♯, F♯ and G♯ consistently have a relatively high count within the corpus.

In 2019, there is a clearly higher count of the keys C♯, F, G and B.

In 2020, the keys C♯, F♯, G♯ and B have a clearly higher frequency in the corpus.

Note that 2021 only contains the first 7 weeks, whereas 2019 and 2020 contain 52 and 53 weeks respectively. Therefore, 2021 is not very representative, and comparisons with the other years should be made with caution.
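One way to make years of different length more comparable is to look at the share of each key per year rather than at raw counts. The following is a minimal sketch of that idea using toy key values (Spotify key integers), not the actual corpus; the function name is hypothetical.

```python
from collections import Counter


def relative_key_frequencies(keys):
    """Share of each key in a list of Spotify key integers."""
    counts = Counter(keys)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


# Toy data: a "full" year and a "short" year with the same proportions.
full_year_keys = [1, 1, 1, 1, 6, 6, 8, 8]
short_year_keys = [1, 1, 6, 8]

full_shares = relative_key_frequencies(full_year_keys)
short_shares = relative_key_frequencies(short_year_keys)
print(full_shares)   # {1: 0.5, 6: 0.25, 8: 0.25}
print(short_shares)  # {1: 0.5, 6: 0.25, 8: 0.25}
```

The raw counts of the two toy years differ by a factor of two, but the shares are identical. Shares do not remove the uncertainty that comes with only 7 weeks of data, though.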

10. Most prevalent beats:
Tempi around 100 BPM and 120 BPM most common in corpus.


The density plot shows that, overall, the most frequent tempi within the corpus are around 90-100 BPM and 115-128 BPM. The year 2019 showed a strong preference for tracks around 98 BPM and, to a lesser extent, 123 BPM.

The year 2020 showed a strong preference for tracks around both 95 BPM and 121 BPM.

The first 7 weeks of 2021 showed a preference for tracks around 99 BPM and 122 BPM.


Average Tempo:

  • 117 BPM
  • 116 BPM
  • 121 BPM

11. “Tigers” by Bilal Wahib has a tempo of 112 BPM


Another popular track in the corpus is “Tigers” by Bilal Wahib. Tempograms are plotted in order to show the estimated BPM of the track along its duration.

The tempo feature of the Spotify API estimates a BPM of 111.943 (rounded 112 BPM).

The first tempogram doesn’t explicitly reflect the estimate of the Spotify API; instead, tempi of around 210-220 and 430-450 BPM are shown in the plot. The plot might be picking up the double-time and quadruple-time tempi of 224 and 448 BPM (based on the estimate of 112 BPM).

The second Tempogram (cyclic), is adjusted to represent the more ‘common’ tempi at which humans tap. This plot does reflect the Spotify API estimation of 112 BPM more clearly.
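The relation between the 112 BPM estimate and the higher tempi in the first tempogram is simple arithmetic: they are integer multiples of the same pulse. A cyclic tempogram treats such multiples as equivalent by folding them into one tempo octave. The sketch below illustrates this; the reference range of 80-160 BPM is an assumption chosen for illustration, not the range used by the plot.

```python
def tempo_multiples(bpm, factors=(2, 4)):
    """Tempi at integer multiples of a base tempo (tempo harmonics)."""
    return [bpm * factor for factor in factors]


def fold_tempo(bpm, low=80.0):
    """Fold a tempo into the octave [low, 2 * low) by halving or doubling."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    while bpm >= 2 * low:
        bpm /= 2
    while bpm < low:
        bpm *= 2
    return bpm


print(tempo_multiples(112))  # [224, 448]
print(fold_tempo(224))       # 112.0
print(fold_tempo(448))       # 112.0
```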

At the 75 second mark, the tempogram shows a slight drop and subsequent increase in tempo. This coincides with the tape-stop sound effect, which is immediately followed by the bridge. The BPM, however, remains the same (try to tap along).



12. Trees and Neighbors: Applying machine learning models to the corpus shows there is a difference between the pre-pandemic and intra-pandemic periods.


The data has been heavily reduced in order to run the machine learning algorithms without crashing. A subset of the data containing the top 3 tracks per week is selected. This totals 336 observations, which are split into the periods before and during the pandemic. This makes it possible for the algorithm to predict whether a track belongs to the period before or during COVID-19. Ten-fold cross-validation is used. Both the knn and tree methods are very accurate, with precision and recall above 90%.

Confusion matrix

The confusion matrix shows how accurately the algorithm predicts the classes. In this portfolio we’d like to find out whether we can classify songs as belonging to the period before or during the pandemic. Looking at the Truth and Prediction, the algorithm performs very well on the subset of the data.

Random forest

The random forest ranks the importance of the different features that can be attributed to the classification of the songs. In this case we find that the variables Streams, G, c06, G#|Ab, B are the most important in this corpus.

knn precision - recall

class             precision   recall
Before COVID-19   0.9607843   0.9245283
During COVID-19   0.9344262   0.9661017

tree precision - recall

class             precision   recall
Before COVID-19   0.9320388   0.9056604
During COVID-19   0.9173554   0.9406780
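Precision and recall follow directly from the counts in a confusion matrix. The sketch below shows the calculation. The counts used here are not taken from the dashboard's confusion matrix: they are integers chosen because they happen to reproduce the reported knn ratios, so they should be read as an illustrative reconstruction only.

```python
def precision_recall(confusion, label):
    """Precision and recall for one class.

    confusion[truth][prediction] holds the number of observations.
    """
    true_positives = confusion[label][label]
    predicted = sum(row[label] for row in confusion.values())  # column total
    actual = sum(confusion[label].values())                    # row total
    return true_positives / predicted, true_positives / actual


# Hypothetical counts that are consistent with the reported knn figures.
knn_confusion = {
    "Before": {"Before": 49, "During": 4},
    "During": {"Before": 2, "During": 57},
}

before_precision, before_recall = precision_recall(knn_confusion, "Before")
during_precision, during_recall = precision_recall(knn_confusion, "During")
print(round(before_precision, 7), round(before_recall, 7))
print(round(during_precision, 7), round(during_recall, 7))
```

Precision answers how many tracks predicted as a period really belong to it, whereas recall answers how many tracks of a period were found by the classifier.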

End of the road: The pandemic does have an impact, but only in the short term. Not enough to change listening behavior.

Conclusion

We’ve seen that the pandemic did have a significant effect on society. This was also reflected in the spikes in the number of streams for specific tracks, shown explicitly in the earlier plots by the following:

  • Togetherness and solidarity at the beginning of the pandemic
  • Earlier and longer Christmas

We’ve seen that Dutch Spotify users react relatively quickly, but only for a short period; this holds for the periods both before and during the pandemic. This was clearly seen in the Eurovision and “17 Miljoen Mensen” examples. This hasn’t changed, which is also reflected in the fact that the subsequent corona waves did not meet with the same solidarity as the first.

However, on average the number of streams remained fairly stable. Although listening behavior was similar to that before the pandemic, we did find that people listened a bit more to ‘happier’ music. Interestingly, this phenomenon, combined with the fact that people tend to be “coronamoe” (corona fatigued), may explain why people in the Netherlands appear happier despite the pandemic.

All in all it can be concluded that the pandemic did have an impact, but at the same time the Dutch are quick to move on.

I hope that the pandemic will be behind us in the near future, which would allow us to analyze post-pandemic music listening behavior!